Assessment of sociodemographic factors associated with time to self-reported COVID-19 infection among a large multi-center prospective cohort population in the southeastern United States

Objective: We aimed to investigate sociodemographic factors associated with self-reported COVID-19 infection. Methods: The study population was a prospective multicenter cohort of adult volunteers recruited from healthcare systems located in the mid-Atlantic and southern United States. Between April 2020 and October 2021, participants completed daily online questionnaires about symptoms, exposures, and risk behaviors related to COVID-19, including self-reports of positive SARS-CoV-2 detection tests and COVID-19 vaccination. Analysis of time from study enrollment to self-reported COVID-19 infection used a time-varying mixed effects Cox proportional hazards framework. Results: Overall, 1,603 of 27,214 study participants (5.9%) reported a positive COVID-19 test during the study period. The adjusted hazard ratios demonstrated lower risk for women, those with a graduate level degree, and smokers. A higher risk was observed for healthcare workers, those aged 18–34, those in rural areas, those from households where a member attends school or interacts with the public, and those who visited a health provider in the last year. Conclusions: We identified subgroups within healthcare network populations, defined by age, occupational exposure, and rural location, that reported higher than average rates of COVID-19 infection for our surveillance population. These subgroups should be monitored closely in future epidemics of respiratory viral diseases.


Introduction
The COVID-19 pandemic has created new health disparities and widened existing ones within US society. Epidemiologic data collected during COVID-19 surveillance provide important insights on at-risk populations, including those with differential access to information and infection control measures, in particular social distancing, masking, and new vaccines. Regional studies carried out by local U.S. healthcare and public health organizations, as well as national cross-sectional studies [1][2][3], have found a higher incidence of COVID-19 illness in ethnic and race-based minority populations and certain age groups [4]. Nonetheless, there is a paucity of comprehensive risk factor analyses that go beyond demographic characteristics to examine other factors such as occupation, self-reported health conditions, health behaviors, and household characteristics.
The COVID-19 Community Research Partnership (CCRP) was a multi-state cohort study designed to monitor the evolution of the pandemic in a large population with both syndromic surveillance and periodic testing for serologic evidence of infection. The goal of the CCRP was to generate data to inform ongoing public health responses to COVID-19 as well as future pandemics by recruiting a diverse cohort of patients and community members. The methods and purpose have been described elsewhere [5].
This large, multicenter study provided an opportunity to further examine population-based risk factors for COVID-19 and to identify characteristics of subgroups at highest risk for becoming infected. As such, the purpose of this study is to investigate factors associated with time from enrollment in the study to a self-reported COVID-19 infection. Herein, we report findings of our risk factor analysis comparing hazard rates in subgroups based on individual and household characteristics.

Materials and methods
Participants were members of the CCRP, a prospective, multi-site COVID-19 surveillance cohort study based on a convenience sample of patients and healthcare workers in ten healthcare systems from the Mid-Atlantic and southeastern US. We recruited adults through patient portals or email from the following health systems and institutions: Wake Forest Baptist Health, Atrium Health, Wake Med, New Hanover Regional Medical Center, Vidant Health, Campbell University, Tulane University affiliated partner systems, University of Mississippi, University of

Participant data
Participants in the study completed daily online questionnaires about symptoms, exposures, and risk avoidance behaviors related to COVID-19, and self-reported any recent positive SARS-CoV-2 detection tests (herein described as "self-reported COVID-19"). Respondents also reported their history of COVID-19 vaccination (date of receipt, product, dose 1 or dose 2, participation in a clinical trial). Race was defined based on responses to the initial study enrollment questionnaire, with options listed as 1) Black or African American, 2) Asian, 3) Hispanic or Latino, 4) White (not Hispanic/Latino), 5) American Indian or Alaskan Native, and 6) Mixed Ethnicity. Participants were invited to complete two supplemental online questionnaires, one focused on the individual and another focused on the individual's household, to provide more detailed information on demographic characteristics, occupations, self-reported health conditions, health behaviors, and household characteristics. Supplemental questionnaires were sent to all actively enrolled participants in May 2021, and subsequent newly enrolled study participants received the surveys within one month of starting the study.
We examined correlations between the occupational characteristics of participants and the primary outcome. The National Institute for Occupational Safety and Health (NIOSH), a United States federal agency responsible for conducting research and making recommendations for the prevention of work-related injury and illness, has characterized workplace exposure to SARS-CoV-2 in hundreds of non-health care occupations using metrics from O*NET, a national database with information on occupational characteristics, together with input from experts in occupational safety and health. Based on these data, occupations are categorized into three risk levels (i.e., high, medium, low) using the SARS-CoV-2 Occupational Exposure Matrix (SOEM) system [6]. For the purposes of our study, SOEM exposure categories were based on three factors identified as contributing to increased risk of exposure in the workplace: whether an occupation involves routine in-person interaction with the public (Public Facing), working indoors (Working Indoors), and working in close physical proximity to others, either co-workers or the public (Close Proximity). Since health care workers were over-represented in the study population, the high exposure group was divided into two groups for data analysis, i.e., high-healthcare and high-non healthcare.

Data analysis
Data from participants who completed both supplemental questionnaires were analyzed to determine demographic, occupational, health-related, and behavioral correlates of self-reported SARS-CoV-2 infection. The analysis was done in R version 4.3.3 using the RStudio Desktop user interface and relied on the following additional packages: 'survival', 'survminer', and 'coxme'. P < 0.05 defined statistical significance. Descriptive statistics were produced by cross tabulation, and covariates for model inclusion were first checked for collinearity using pairwise correlation. Analysis followed a time-varying mixed effects Cox proportional hazards framework, with a shared frailty at the level of recruitment site to account for homogeneity within each site/health network and to account for intraclass correlation in the outcome of interest (self-reported COVID-19 infection) within health networks. Hazard ratios and 95% confidence intervals are reported as unadjusted estimates and as adjusted estimates for all covariates included in the final model (Table 2). Survival curves are also presented using the Kaplan-Meier product limit estimator for selected covariates.
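To make the Kaplan-Meier product limit estimator mentioned above concrete, the following is a minimal sketch in Python. It is not the authors' code (their analysis used R with the 'survival' and 'survminer' packages); the function name `kaplan_meier` and the toy follow-up data are hypothetical and chosen only for illustration.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Product-limit estimate of the survival function.

    durations: follow-up time in days for each participant (hypothetical data)
    events:    1 if the participant self-reported infection at that time,
               0 if the participant was right-censored
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    events_at = Counter(t for t, e in zip(durations, events) if e)
    leaving_at = Counter(durations)  # events plus censorings at each time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving_at):
        d = events_at.get(t, 0)
        if d:
            # Multiply by the conditional probability of remaining event-free at t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Everyone who had an event or was censored at t leaves the risk set
        at_risk -= leaving_at[t]
    return curve


# Toy data (assumed): six participants, three events, three censored
toy_days = [5, 8, 8, 12, 15, 20]
toy_event = [1, 0, 1, 0, 1, 0]
curve = kaplan_meier(toy_days, toy_event)
```

With the toy data, the estimated probability of remaining infection-free steps down only at days 5, 8, and 15, which are the days with a reported event; censored participants reduce the size of the risk set without producing a step.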
Two time-varying covariates were included in the categorical hazard analysis: 1) daily county level 7-day average COVID-19 incidence data published by the New York Times in 2020 and 2021 [7], and 2) COVID-19 vaccination status of participants. Participants were considered vaccinated after they reported receiving their first dose of any COVID-19 vaccine.
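A time-varying covariate such as vaccination status is commonly handled by splitting each participant's follow-up into intervals over which the covariate is constant (the counting-process layout accepted by, for example, `Surv(start, stop, event)` in the R 'survival' package). The paper does not describe its data layout, so the sketch below is an assumption about one reasonable way to do this; the function name `split_on_vaccination` and the example values are hypothetical.

```python
def split_on_vaccination(follow_up_days, event, vaccination_day=None):
    """Split one participant's follow-up into (start, stop, vaccinated, event) rows.

    follow_up_days:  days from enrollment to self-reported infection or censoring
    event:           1 if follow-up ended with a self-reported infection, else 0
    vaccination_day: days from enrollment to first reported vaccine dose,
                     or None if no dose was reported (assumed encoding)
    """
    # Never vaccinated during follow-up: a single unvaccinated interval
    if vaccination_day is None or vaccination_day >= follow_up_days:
        return [(0, follow_up_days, 0, event)]
    # Vaccinated at or before enrollment: a single vaccinated interval
    if vaccination_day <= 0:
        return [(0, follow_up_days, 1, event)]
    # Otherwise the event, if any, belongs to the vaccinated interval
    return [
        (0, vaccination_day, 0, 0),
        (vaccination_day, follow_up_days, 1, event),
    ]


# Hypothetical participant: first dose on day 120, infection reported on day 300
rows = split_on_vaccination(300, 1, vaccination_day=120)
# Hypothetical participant: no dose reported, censored on day 200
rows_unvaccinated = split_on_vaccination(200, 0)
```

The county-level incidence covariate changes daily, so the same idea would apply with finer intervals, one row per participant per period of constant incidence.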

Results
A total of 69,714 participants were enrolled in the CCRP study. Of those, 42,701 (61.3%) completed the individual adult supplemental survey and 31,642 (45.4%) completed the household supplemental survey. Just under thirty thousand (29,973 [43.0%]) participants completed both the individual adult and household supplemental surveys. Participants who reported residing outside of Maryland, Virginia, Washington D.C., North Carolina, South Carolina, Mississippi, and Louisiana, as well as participants who were participating in clinical trials, were excluded, leaving a total of 27,214 participants (39% of the total number of CCRP study participants) in the analytic population. The median follow-up time was 307 days, with an interquartile range of 246 days (255-501). Self-reported COVID-19 was considered an event and all other survey responses were considered right censored. The earliest start date in the data is 2020-04-09 and the last follow-up date is 2021-10-31. The shortest follow-up time for a single participant was one day and the longest was 570 days.
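The cohort percentages and follow-up figures reported above can be cross-checked with simple arithmetic. The short Python sketch below uses only the counts and dates stated in the text; the variable and function names are illustrative.

```python
from datetime import date

enrolled = 69_714           # total CCRP participants
individual_survey = 42_701  # completed individual adult supplemental survey
household_survey = 31_642   # completed household supplemental survey
both_surveys = 29_973       # completed both supplemental surveys
analytic = 27_214           # analytic population after exclusions
events = 1_603              # self-reported positive tests


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Longest possible follow-up: earliest start date to last follow-up date
max_follow_up_days = (date(2021, 10, 31) - date(2020, 4, 9)).days

# Interquartile range width from the reported quartiles (255 and 501 days)
iqr_width = 501 - 255
```

These checks reproduce the reported 61.3%, 45.4%, 43.0%, and 39% of enrolled participants, the 5.9% event proportion in the analytic population, the 570-day maximum follow-up, and the 246-day interquartile range.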
Overall, 1,603 of the 27,214 study participants (5.9%) reported that they had a positive test for SARS-CoV-2 between enrollment and the end of October 2021 (Table 1). The study population was predominantly female (71.4%) and White/non-Hispanic (88.3%; Table 1). Most participants lived in counties classified as urban (56.1%), and 96.9% had at least some college education, with a large proportion holding graduate level degrees (47.9%). The study population was affluent (54.1% had a household income over $100,000). Data were not available for SOEM category designation for 32% of subjects, but the other participants were evenly distributed among the four SOEM exposure groups (low, medium, high-non healthcare, and high-healthcare). The networks with the largest numbers of participants included Wake Forest (32.6%), MedStar (26.7%), and Atrium (17.1%).
Adjusted and unadjusted hazard ratios for self-reported COVID-19 are shown in Table 2. Even though a higher proportion of women in the study population reported SARS-CoV-2 infections as compared to men (6.1% vs. 5.5%), the adjusted hazard ratio demonstrated lower risk for women (aHR = 0.87 [0.77-0.98]). When compared to the risk of self-reported COVID-19 among participants ages 18-34, all other age groups had a significantly lower risk for infection (vs. ages 35-49 aHR = 0.80; vs. ages 50-64 aHR = 0.63; vs. ages >65 aHR = 0.45). No difference in risk of COVID-19 was seen based on race or ethnicity.
In unadjusted analysis, participants living in urban counties had a lower risk of self-reported infection compared to those living in rural counties (HR 0.66 [0.58-0.75]), but this was attenuated and only borderline significant in adjusted analysis (aHR = 0.88 [0.77-1.00]).
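A reported hazard ratio with its 95% confidence interval can be related back to an approximate standard error and p-value on the log scale. The sketch below illustrates this for the adjusted hazard ratio for women (aHR = 0.87 [0.77-0.98]). It assumes a Wald-type interval that is symmetric on the log scale and works from rounded published values, so the result is only approximate and is not taken from the paper; the function name is hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_from_ci(hr, lower, upper):
    """Approximate log-scale SE, z statistic, and two-sided p-value from a
    hazard ratio and its 95% CI, assuming a Wald interval on log(HR)."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    z = math.log(hr) / se
    # Two-sided p-value under the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Adjusted hazard ratio for women vs. men, as reported in the text
se, z, p = wald_from_ci(0.87, 0.77, 0.98)
```

Because the upper confidence limit lies below 1, the approximate p-value falls under the 0.05 threshold used in the analysis, consistent with the reported lower risk for women.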
Educational level was strongly associated with risk, with a significantly lower risk for participants with a graduate level degree (aHR = 0.57 [0.44-0.76]) compared to those with no college education. As compared to participants who had seen their primary care provider within the past year, a lower risk was observed for those whose last primary care visit was 1-2 years ago (aHR = 0.84 [0.71-0.98]) or 2-5 years ago (aHR = 0.68 [0.49-0.94]). Smokers had significantly lower risk of infection as compared to non-smokers (aHR = 0.73 [0.58-0.92]). Risk did not correlate with the number of self-reported health conditions, but there was a borderline significant increase in risk for participants with 3 or more chronic health conditions as compared to those who reported no chronic conditions (aHR = 1.

Discussion
Our multicenter COVID-19 Community Research Partnership study provided a unique opportunity to examine the effect of demographic factors on the risk of SARS-CoV-2 infections in populations across the southern United States. After adjustment, we found that multiple social and economic factors were strongly associated with self-reported SARS-CoV-2 infection during the pandemic. Risk of infection was significantly higher in young adult participants ages 18-34 years as compared to older groups, and the hazard ratios indicated that risk of infection relative to the youngest participant group decreased further with each sequentially older age stratum, which is consistent with findings from other studies [8,9]. This observation may be related to increasing concern about disease outcome in older age groups, leading to greater adoption of preventative behaviors among older age cohorts. In contrast to the findings from other studies [10][11][12], we found no association between race or ethnicity and the risk of self-reported infection, a finding that may reflect the underrepresentation of minorities and/or the relative affluence of our study population. Participants with graduate level college education also demonstrated lower risk as compared to participants without college education, again consistent with other studies [13]. Potential explanations for this observation include a lower likelihood of work-related contact with the general public and a greater awareness of risk and effective methods of protection among more educated subjects. The observation of lower risk among those whose last encounter with a primary care provider was 1-5 years ago, as compared to those who saw their provider within the past year, could be due to greater awareness of COVID-19 from the health care provider, or to health care visits related to known or suspected COVID-19 illnesses, among those seen more recently. Counterintuitively, smokers appeared to have a lower risk of infection as compared to nonsmokers. This paradoxical relationship between smoking and risk of illness from COVID-19 has been observed in several other studies [14,15], including some that speculate that nicotine could mitigate the effect of the cytokine storm in COVID-19 patients [16]. However, these studies also point out that smokers are underrepresented in many COVID-19 population-based studies, as they appear to be in our study as well (5.4% of participants vs. the CDC estimated average of 12.4% for the southern United States in 2021) [17]. Overall, the harmful effects of smoking are often cited as offsetting any possible benefit from smoking for COVID-19 related morbidity.
As expected, participants with a higher occupational risk of exposure to the general public demonstrated a coincident higher risk of infection, as did those living in a household with an individual who attends in-person classes or who encounters the general public in their workplace. When risk was compared by occupational group using the NIOSH SOEM risk categories, the most significant increase in risk was from working in a healthcare setting. These results may be used to inform identification of high-risk groups for future respiratory disease outbreaks, allowing targeted programs to promote protective measures and behaviors that reduce the risk of infection.
Our study is subject to several limitations. COVID-19 cases were ascertained based on self-reported data from daily surveys, so the infections could not be independently verified. Participants varied in the frequency with which they responded to the daily surveys, which limits our ability to determine the exact date of any reported case of COVID-19. Generalizability is also limited due to the overrepresentation of lower risk socioeconomic groups in the study population and to recruitment that was limited to patients from healthcare networks and healthcare workers. As mentioned in the results section, this analysis was also limited by the percentage of participants (39%) who completed the supplemental questionnaires.
In contrast, the strengths of this study include a large sample size from a wide geographic area, enrolled early in the pandemic and surveyed for a number of demographic, social, and economic characteristics. Our study population was recruited from 7 large health systems or networks that provide care through affiliated hospitals, clinics, and physicians in urban areas. This is the most common model for healthcare in the United States, where 69.7% of hospitals and 42.7% of physicians are in health systems, and 91.6% of hospital discharges are from system hospitals [18]. While there are no data about the racial or age distribution of US adults covered by health systems versus those who are not, data are available demonstrating that racial minorities are overrepresented among US adults without health insurance (in 2022, 20.9% of Hispanic adults, 10.4% of non-Hispanic blacks, and 6.4% of non-Hispanic whites were uninsured) [19]. As our population was composed of individuals with health insurance from several networks in 5 states and the District of Columbia, we believe that it is similar to the populations of other US health networks, and that our results are informative for identifying characteristics that can help other US health system populations in future pandemics.
Our results validate findings from other studies and expand the body of evidence with several novel features. The categorical hazard analysis in this diverse study population is strengthened by employing two time-varying covariates: 1) county level 7-day average COVID-19 incidence data updated daily and published by the New York Times in 2020 and 2021, and 2) COVID-19 vaccination status of participants. The study also examined correlations between the occupational characteristics of participants and self-reported COVID-19 using the National Institute for Occupational Safety and Health (NIOSH) characterization of workplace exposure, which draws on metrics from O*NET, a national database with information on occupational characteristics, and is categorized using the SARS-CoV-2 Occupational Exposure Matrix (SOEM) system. These results have important public health significance due to the size of the multicenter study population, the rigorous analytical approach, and the use of novel categorical metrics such as the SOEM system. Specifically, our results suggested a higher risk for men, those who do not have a graduate level degree, healthcare workers, younger adults aged 18-34, those in rural areas, and those from households where a member attends school or interacts with the public. All of these groups are likely to be at higher risk in future pandemics and are appropriate targets for increased efforts to prevent infections.
While our results illustrate the challenges inherent in understanding risk based on the complex interaction of age, race, and occupation, and in disentangling this from the risk due to local community infection rates, they also provide potential target groups for ongoing intervention. Under the assumption that those most at risk for first SARS-CoV-2 infection remain at higher risk for repeated infections, even in the changing risk environment from vaccine uptake and shifting perceptions of the necessity of mitigations, this study may provide insight into ways to reduce ongoing disparities from the pandemic through risk stratification and targeted interventions.
site selection and recruitment, form development, statistical approach) described in this article.

Table 1. (Continued)
* Values are all n (%) unless otherwise noted. No = Did not report COVID-19 infection during surveillance period; Yes = Did report COVID-19 infection during surveillance period.
https://doi.org/10.1371/journal.pone.0293787.t001

Table 2. (Continued)
Adjusted HRs are adjusted for Sex, Age, Race-Ethnicity, Urban-Rural, Education, Influenza Vaccine History, Time Since Last Primary Care Visit, Tobacco Use, Co-morbidities, Household Income, In-person class, Social Contact Occupation, COVID-19 Vaccination, and SOEM Risk Class.